Personalising speech-to-speech translation: Unsupervised cross-lingual speaker adaptation for HMM-based speech synthesis

Authors

  • John Dines
  • Hui Liang
  • Lakshmi Saheer
  • Matthew Gibson
  • William J. Byrne
  • Keiichiro Oura
  • Keiichi Tokuda
  • Junichi Yamagishi
  • Simon King
  • Mirjam Wester
  • Teemu Hirsimäki
  • Reima Karhila
  • Mikko Kurimo
Abstract

In this paper we present results of unsupervised cross-lingual speaker adaptation applied to text-to-speech synthesis. The application of our research is the personalisation of speech-to-speech translation, in which we employ an HMM statistical framework for both speech recognition and synthesis. This framework provides a logical mechanism to adapt synthesised speech output to the voice of the user by way of speech recognition. In this work we present results of several different unsupervised and cross-lingual adaptation approaches as well as an end-to-end speaker-adaptive speech-to-speech translation system. Our experiments show that we can successfully apply speaker adaptation in both unsupervised and cross-lingual scenarios, and our proposed algorithms seem to generalise well for several language pairs. We also discuss important future directions, including the need for better evaluation metrics.

Corresponding author: John Dines ([email protected]). Preprint submitted to Computer Speech and Language, February 15, 2010.


Related articles

Unsupervised cross-lingual speaker adaptation for HMM-based speech synthesis

In the EMIME project, we are developing a mobile device that performs personalized speech-to-speech translation such that a user’s spoken input in one language is used to produce spoken output in another language, while continuing to sound like the user’s voice. We integrate two techniques, unsupervised adaptation for HMM-based TTS using a word-based large-vocabulary continuous speech recognizer...


Analysis of unsupervised cross-lingual speaker adaptation for HMM-based speech synthesis using KLD-based transform mapping

In the EMIME project, we developed a mobile device that performs personalized speech-to-speech translation such that a user’s spoken input in one language is used to produce spoken output in another language, while continuing to sound like the user’s voice. We integrated two techniques into a single architecture: unsupervised adaptation for HMM-based TTS using word-based large-vocabulary contin...


Personalising Speech-To-Speech Translation in the EMIME Project

In the EMIME project we have studied unsupervised cross-lingual speaker adaptation. We have employed an HMM statistical framework for both speech recognition and synthesis which provides transformation mechanisms to adapt the synthesized voice in TTS (text-to-speech) using the recognized voice in ASR (automatic speech recognition). An important application for this research is personalised spee...


State mapping based method for cross-lingual speaker adaptation in HMM-based speech synthesis

A phone mapping-based method was previously introduced for cross-lingual speaker adaptation in HMM-based speech synthesis. In this paper, we propose a state mapping based method for cross-lingual speaker adaptation, where the state mapping between voice models in the source and target languages is established under a minimum Kullback-Leibler divergence (KLD) criterion. We introduce two approach...
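The minimum-KLD state mapping described above can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: the function names are invented, states are reduced to single diagonal-covariance Gaussians, and the closed-form Gaussian KL divergence stands in for whatever state-level divergence the full system uses.

```python
# Hypothetical sketch of KLD-based state mapping between two HMM voice
# models. Each "state" here is simplified to one diagonal-covariance
# Gaussian (mean vector, variance vector).
import numpy as np

def gaussian_kld(mu1, var1, mu2, var2):
    """Closed-form KL(N1 || N2) for diagonal-covariance Gaussians,
    summed over feature dimensions."""
    mu1, var1 = np.asarray(mu1, float), np.asarray(var1, float)
    mu2, var2 = np.asarray(mu2, float), np.asarray(var2, float)
    return float(np.sum(0.5 * (np.log(var2 / var1)
                               + (var1 + (mu1 - mu2) ** 2) / var2
                               - 1.0)))

def build_state_mapping(src_states, tgt_states):
    """Map every source-language state index to the target-language
    state index with minimum KLD. States are (mean, variance) pairs."""
    mapping = {}
    for i, (mu_s, var_s) in enumerate(src_states):
        klds = [gaussian_kld(mu_s, var_s, mu_t, var_t)
                for (mu_t, var_t) in tgt_states]
        mapping[i] = int(np.argmin(klds))
    return mapping
```

Adaptation transforms estimated on the source-language states can then be applied to their mapped target-language counterparts, which is the essence of the transform-mapping approach.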


HMM-based polyglot speech synthesis by speaker and language adaptive training

This paper describes a technique for speaker and language adaptive training (SLAT) for HMM-based polyglot speech synthesis and its evaluations on a multi-lingual speech corpus. The SLAT technique allows multi-speaker/multi-language adaptive training and synthesis to be performed. Experimental results show that the SLAT technique achieves better naturalness than both speaker-adaptively trained l...



Journal:
  • Computer Speech & Language

Volume 27, Issue 

Pages -

Publication date: 2013